Abstract: Internet worms, which spread in computer networks
without human mediation, pose a severe threat to computer
systems today. The rate of propagation of worms has been
measured to be extremely high and they can infect a large fraction 
of their potential hosts in a short time. We study two different 
methods of patch dissemination to combat the spread of worms. 
We first show that using a fixed number of patch servers performs 
woefully inadequately against Internet worms. We then show that 
by exploiting the exponential data dissemination capability of 
P2P systems, the spread of worms can be halted very effectively. 
We compare the two methods by using fluid models to compute 
two quantities of interest: the time taken to effectively combat 
the progress of the worm and the maximum number of infected 
hosts. We validate our models using Internet measurements and 
simulations. 



I. Introduction 

The advent of malicious mobile code has led to a paradigm
shift in Internet security applications. Earlier, computer viruses 
were inherently limited by the fact that human mediation 
was required for them to propagate, which also meant that 
human intervention was sufficient to contain them. However, 
with increased connectivity of computers and availability of 
information regarding vulnerabilities of operating systems and 
applications, there have been several instances of malicious 
code that propagates on its own. Such mobile malicious code
is now called a worm. Interest in worms has been fueled
by headline-making attacks causing near cessation of Internet 
services, and the names of these worms (such as Code-Red,
Slammer and Blaster) are now known to most Internet users.

Measurement studies indicate that worm propagation usually
follows the classical sigmoid curve as illustrated in
Figure 1. The figure, which is obtained from [1], shows the
propagation of the Code-Red (v2) worm measured over the 
duration of 24 hours. There is an exponential growth stage 
followed by a slow finish stage. The worm was programmed 
to switch from an 'infection phase' to an 'attack phase', and 
begin an attack on certain websites at a preselected time. Such 
behavior is by no means unique to Code-Red. The same kind 
of infect-then-attack behavior was observed with the Blaster 
worm [2] as well. The Witty worm deleted small sections of 
the hard-disk contents on the infected hosts and its effects on 
the system were noticeable only over time [3]. The Slammer 
worm was the most benign to the infected host - all it did 
was to spread [4]. However, it caused storms of packets that 
overloaded networks as it spread. 

Fig. 1. Graph from [1] based on Internet measurement data, illustrating the
nature of propagation of the Code-Red (v2) worm.

Worms seen so far have not significantly injured their
hosts during the time that they spread, since killing their
host would prevent them from spreading effectively. A host
might actually be unaware that it is infected, as none of its 
functions are impaired. This fact means that one can deal with 
worm infestations by patching. Hosts that are susceptible to 
the worm as well as those already infected could download 
and install a patch, which has the dual role of eliminating the 
malicious code from the host and closing the hole that enabled 
the infection in the first place. Typically patches are issued 
by either the creator of the OS or by a dedicated anti-virus 
provider. However, given the alarming rate at which worms can 
propagate (the Slammer worm infected more than 90 percent 
of vulnerable hosts within 10 minutes [4]), there has been a 
need to rethink strategies for handling worm attacks. By and 
large, research has focused on three areas - monitoring of 
worms, cutting down the rate of propagation (throttling) and 
delivering patches. 

In our model we have a network of susceptible hosts that 
subscribe to the services of a patch provider. This assumption 
is made on the basis of the fact that major OS creators 
automatically provide subscription to their patching services. 
We assume that the number of infected hosts when the patch 
is released is small as compared to the total number of 
hosts, which is in accord with the fact that so far most 
attacks have happened after a vulnerability has been disclosed.
Also, worms which exploit previously unknown vulnerabilities 
(zero-day worms) have not been common [5]. Once the patch 
is released, the provider sends an update message (which is 
tiny as compared to the patch) to all hosts which proceed to 
try and download the patch. 

When designing a system for the containment of worms, the
main question that comes to mind is that of how long one
has before the worm goes into 'attack phase'. If this time is 
sufficiently large, one could hope to patch a large fraction of 
computers before the attack occurs. Considering the fact that 
worms have a stage in which their growth rate is exponential, 
even if the worm is slowed down, the time taken to infect a 
large fraction of hosts is likely to be small. In such a case it 
is very possible that a fixed number of patch servers would 
be unable to cope with the spread of the worm. It might then 
be advisable to combine throttling with a peer-to-peer (P2P) 
network that would be used for patch dissemination. 

Related Work 

While the science of epidemiology or the study of causes, 
distribution, and control of disease in populations has been 
of interest to mankind for centuries, the past couple of years 
have seen a large upsurge of interest in the field. Interest in
the area has stemmed from both computer worm epidemics 
as well as the organic kind. There is now an ever increasing 
body of literature dealing with the measurement, modeling 
and analysis of computer worm propagation and prevention. 
We highlight some important contributions in this area. 

A good deal of work has gone into measuring the spread 
of worms on the Internet [1], [2], [4]. Researchers often try 
to reverse engineer the worm to understand its nature and the 
signature of its attack process. The actual measurement is done 
by means of a network telescope. The idea here is to monitor a 
large fraction of the Internet address space [6], [7]. Abnormal 
activity would register hits on the monitored space. In [8], 
[9], there are ideas on how a P2P network could be used for 
monitoring of abnormal behavior. 

Using simple fluid models [10], it is possible to study 
disease propagation using simple deterministic differential 
equations [11], [12]. In the area of computer worms, initial 
work [13]-[15] largely focused on showing that the epidemiology
model also applies to the spread of computer worms.
Basic study of defense systems is also present in this work. 
More recently, advanced models of worms, which include fine 
details such as non-uniform scanning rates, as well as ways to 
scale down the network for faster simulation that are accurate 
for certain worms like Slammer, have also been studied [16], 
[17]. 

Defense against worms, either by passive or active means, 
has developed in parallel with the worms themselves. As 
models of worm proliferation have matured, using such models 
to make predictions on the performance of worm containment 
schemes has gained popularity. Some interesting examples of 
such work are [5], [18]. However, they concentrate on the 
number of infected hosts at infinite time, rather than at the 
time at which the attack phase of the worm begins. Moreover,
they do not consider the case when infected hosts can
be patched. As observed in [19], worms seen thus far have 
usually been fairly benign initially to the infected host so as 
to spread quickly, which means that the system operations are 
not significantly compromised. 

Worm activity in an infected computer can be inferred from
the fact that worms tend to try to set up new connections at a high



rate. This behavior immediately suggests a way of slowing 
down the spread of worms. By slowing down the rate at which 
new connections are established, worm applications can be 
retarded. This is the principle behind virus throttling [20], [21]. 
Thus, throttling a virus buys time in which a patch may be 
disseminated in the network. 

P2P networks have been showing ever increasing popularity 
as a means of data dissemination. Internet users are now quite 
familiar with the concept and are well aware of software like 
KaZaa and BitTorrent which implement the idea. Since these 
systems usually have a large number of users, fluid models 
may be used in understanding their performance. Work on 
modeling and analysis of such systems is present in [22]-[24]. 

The P2P idea for worm containment has been considered 
in earlier work. In [25], the authors consider several types of 
worm defense mechanisms, including patching with a fixed 
number of patch servers and different types of "patching 
worms" that duplicate the worm's behavior to disseminate 
patches. Using a graph-theoretic model, they show that the 
patching worms would perform better than a fixed number of 
patch servers. However, the improvement in performance due 
to the patching worms is not quantified using the graph model. 
They also consider the epidemic differential equation models 
to quantify the number of peak scans in the system. In [5], the 
authors conduct an extensive numerical comparison between 
the performance of patching worms and content filtering and 
conclude that the two methods have comparable effects only 
when content filtering covers 89% of the hosts. Along with 
the monitoring aspect, [9] also considers the P2P idea for 
propagating alerts generated by the peers themselves about 
possible worm infestations and performs detailed simulations
on the fraction of hosts that such a system could save. [18] 
contains an extensive analytical study of worm propagation 
and a cooperative P2P system for patching is considered. 
However, the P2P idea is not thoroughly investigated in this 
work. 

How is our work different? 

The object of our study is to obtain a fundamental insight 
into the propagation of worms under active defense. We use 
the fluid models describing worm scanning and containment 
schemes and solve them to obtain closed-form solutions. Once 
we have the solutions, our focus is on the orders of magnitude 
of parameters (such as worm propagation time, maximum 
number of infected hosts, and patching time) in the system. 
We express our results in terms of three quantities:

1) The total number of hosts in the system $N$ (a large
number).

2) The virulence of the worm, denoted by $\beta$ (infections
per unit time), which is the maximum rate at which the
worm can spread.

3) The ratio of the maximum rate of patch propagation to the
worm's virulence, denoted by $\gamma$ (dimensionless).

We present our main insights below, starting with a fairly
obvious one that serves as a benchmark, and proceeding to
less intuitive ones:

• A well designed worm will spread in $\Theta(\ln N)$ time to
a significant fraction of the hosts.

While this result seems intuitively clear from the well- 
known exponential phase of worm spreading, it provides a 
useful benchmark to compare different patching schemes. 
Essentially, it says that any action that is taken to contain 
worms would have to be done within a logarithmic time 
frame, since a smart worm would switch to the attack 
phase at this time. 

For example, for a worm like Code-Red with a susceptible
population of about 360,000 hosts, and $\beta = 1.8$
infections per hour [14], the value of $\frac{1}{\beta}\ln N$ is about 7
hours. So if a patching scheme does not patch most of
the hosts in 7 hours, it is practically useless at dealing
with the worm.

• With a fixed number of patch servers, both the
maximum number of infected hosts and the time taken
to disinfect the system are $\Theta(N)$.

We show that in the case of a fixed number of patch
servers, the time at which the infection starts to decay is
$\Theta(\ln N)$ and that the number of infected hosts is $\Theta(N)$
at this time. So a fixed number of patch servers has
practically no effect on the spread of the worm until most
of the hosts are infected. We also show that the time taken
to wipe out the infection is $\Theta(N)$, so it takes a very long
time for the system to be free of worms. $\gamma$ plays almost
no role in the results.

In the Code-Red like worm example, if we rely on a fixed
number of patch servers, even if $\gamma = 300$, in roughly 7
hours we have an infected population of 200,000. It takes
about 25 hours to rid the system of the worm.

• In a P2P system, $\Theta(N^{1/\gamma})$ is the maximum number
of infected hosts and $\Theta(\ln N)$ is the time taken to
disinfect the system.

We show that using P2P patch dissemination, the time
at which the infection starts decreasing is $\frac{1}{\gamma}\ln\left(\frac{N}{\gamma P}\right)$,
where $P$ is the number of hosts that initially hold the patch,
the maximum number of infected hosts is $\Theta(N^{1/\gamma})$ (or
$\Theta(N)$ if $\gamma < 1$), and the time taken for the system to be
worm free is $\frac{1}{\gamma}\left(1 + \frac{1}{\gamma}\right)\ln(N)$. Thus, the infection hits
its peak and vanishes in $\Theta(\ln N)$ time. The value of $\gamma$
can be increased by throttling the worm. For $\gamma > 1$, even
small increases have a profound effect on P2P systems:
for instance, a $\gamma$ of 2 shows performance of a greatly
superior order than a $\gamma$ of 300 in the fixed number of
patch servers scheme.

For the Code-Red like worm example, with $\gamma$ being 2,
the maximum number of infected hosts is of the order
1000 and the infection both hits its peak and is wiped
out in about 5 hours, a paradigm shift from the fixed
number of servers case!

• The number of hosts to be monitored in order to get
reliable measurements is $\Theta\left(\frac{N}{\ln N}\right)$.

We show that if we are to obtain information about the
worm's presence before it spreads to very many of the
hosts (which takes $\Theta(\ln N)$ time), i.e., if we would like to
know that the worm is in the system by $\ln \ln N$ time, then
we have to monitor $\Theta\left(\frac{1}{\ln N}\right)$ of the hosts in the system.
We also show that the same order or higher of hosts must
be monitored in order to get a reliable estimate of the
number of infected hosts at any time.

In the Code-Red like worm case, this means that if we
want to know about the presence of the worm in the
time $\frac{1}{\beta}\ln\ln(360{,}000) = 1.4$ hours, we would have
to monitor about 28,000 susceptible hosts. For a worm
that has the whole of the (IPv4) Internet as its prey, with
$2^{32}$ addresses, this number is about $2^{27}$, a phenomenally
large number! To monitor such a large number of hosts
one would need either the active participation of network
operators or of the hosts themselves as a P2P system in
identifying anomalous traffic.
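
The order-of-magnitude figures quoted in the insights above can be reproduced with a few lines of arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes the monitoring requirement scales as $N/\ln N$ with the constant factor set to one.

```python
import math


def spread_time(n_hosts, beta):
    """Benchmark time (1/beta) * ln N for the worm to reach a large fraction of hosts."""
    return math.log(n_hosts) / beta


def detection_deadline(n_hosts, beta):
    """Early-warning time (1/beta) * ln ln N used in the monitoring discussion."""
    return math.log(math.log(n_hosts)) / beta


def hosts_to_monitor(n_hosts):
    """Order-of-magnitude estimate N / ln N (constant factor assumed to be 1)."""
    return n_hosts / math.log(n_hosts)


if __name__ == "__main__":
    n_code_red, beta_per_hour = 360_000, 1.8  # Code-Red like example from the text
    print("spread time (hours):", spread_time(n_code_red, beta_per_hour))
    print("detection deadline (hours):", detection_deadline(n_code_red, beta_per_hour))
    print("hosts to monitor:", hosts_to_monitor(n_code_red))
    print("IPv4 hosts to monitor (log2):", math.log2(hosts_to_monitor(2 ** 32)))
```

Times come out in whatever unit $\beta$ is expressed in (hours here), since dividing by $\beta$ converts ITU to actual time.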

Organization of the Paper 

We begin the paper in Section II by reviewing a differential
equation model for the uniform scanning worm on the lines of
the classical epidemic model. The model has been solved earlier,
and using the solution we show the exponential spreading
of the worm. The models and results in the rest of the paper
are original and form the main contribution. In Section III
we construct an analytical model of the patching process. We
create models for both the fixed number of servers and the P2P
case and solve them. From the solutions we make predictions
on the performance of the systems in dealing with worms.
In Section IV we provide both measured data and simulations
illustrating the characteristics of the patching process. We then
move on to the problem of monitoring the system for worms
in Section V. Finally, we conclude with pointers to extensions
in Section VI.

II. Worm Propagation Model 

We first review the simple epidemic model to understand 
worm propagation. Let the number of hosts in the network 
be N. We assume that all hosts are identical in operation and 
that until a host has been patched, it is vulnerable to a worm. 
Let the number of susceptible hosts at time t be denoted by 
S(t). Similarly, let the number of infected hosts at time t be 
denoted by I(t). Then we have that at any time t, 

S(t)+I(t)=N. (1) 

We assume that an infected host scans the address space of 
the network uniformly. This assumption follows from the fact 
that under our model all hosts are identical, and so are equally 
vulnerable to the worm. 

The fluid model is constructed as follows. Consider any one
infected host. The probability of its choosing a susceptible host
for infection is $S(t)/N$. Let the average time taken for infecting
a susceptible host be $1/\alpha$. Then if the infected host chooses to
scan $Q$ hosts in a unit of time, and there are $I(t)$ infected
hosts performing the same kind of Bernoulli trials, then as
$N \to \infty$ the expected number of newly infected hosts in a unit time
is $Q\alpha\, I(t)\, S(t)/N$.

The factor $Q\alpha$ is the maximum number of susceptible hosts
that an infected host can infect per unit time. We define $\beta =
Q\alpha$, which we call the virulence of the worm. In this paper
we are primarily interested in the order relations of the system
with $N$. So we take the unit of time as the expected time
taken for an infection ($1/\beta$), which we call infection time
units (ITU). Note that we may convert ITU to actual time by
just multiplying by this factor. Then with time measured in
ITU, the expected number of newly infected hosts in an ITU is

\[ \Lambda \triangleq \frac{I(t)\, S(t)}{N}, \tag{2} \]

with $\Lambda$ being the rate of infection.

Now, if we assume that the infection process is Markovian,
with the time taken for infection exponentially distributed
with transition rate equal to $\Lambda$, then it can be shown [10] that as
$N \to \infty$, the fraction of infected hosts $i(t) = I(t)/N$ converges
to

\[ i(t) = i(0) + \int_0^t \bigl(1 - i(s)\bigr)\, i(s)\, ds, \tag{3} \]

where we have used (2). We represent the above showing
explicit dependence on $N$ (in differential form) as

\[ \frac{dI(t)}{dt} = \frac{I(t)\, S(t)}{N}, \tag{4} \]

where it is understood that $N$ is large.

The above is identical to the classical simple epidemic 
model [11] and has been used successfully in modeling the 
spread of infectious diseases. It has the closed form solution 

\[ I(t) = \frac{I(0)\, e^{t}}{1 - \frac{I(0)}{N}\left(1 - e^{t}\right)}. \tag{5} \]

The plot of the above expression looks much like Figure 1:
it grows exponentially initially and then levels off, yielding the
classic sigmoidal shape.
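
To make the sigmoidal behaviour concrete, the closed-form solution of the simple epidemic model can be evaluated directly. The Python sketch below is illustrative only; the function name and the sample parameters are assumptions, and time is measured in ITU.

```python
import math


def infected_hosts(t, n_hosts, i0):
    """Closed-form solution of dI/dt = I (N - I) / N with I(0) = i0 (time in ITU)."""
    growth = math.exp(t)
    return i0 * growth / (1.0 - (i0 / n_hosts) * (1.0 - growth))


if __name__ == "__main__":
    n_hosts, i0 = 360_000, 1  # illustrative values
    for t in range(0, 26, 5):
        # Early samples grow roughly like e^t; later samples saturate near N.
        print(t, round(infected_hosts(t, n_hosts, i0)))
```

Sampling the function on a grid shows the two regimes discussed in the text: near-exponential growth while $I(t) \ll N$, followed by saturation at $N$.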

How long does it take for the worm to spread to a large 
number of hosts? 

Given that worms so far either follow a spread-then-attack 
mode of operation or cause gradual damage, it would be 
interesting to know the order of time by which a large number 
of hosts are infected. We could possibly expect an attack (or 
significant damage) to occur at this time. It also gives a rough 
benchmark time at which we can compare the performance of 
different patching schemes. We use the following notation that
defines a set of functions $\Theta(g(N))$. We say $f(N) \in \Theta(g(N))$
if there exist positive constants $c_1$, $c_2$ and $M$ such that

\[ c_1\, g(N) \le f(N) \le c_2\, g(N) \quad \forall N > M. \tag{6} \]

Theorem 1: The time by which significant spread of the
worm occurs is $\Theta(\ln N)$.

Proof: We would like to know when $I(t) = \kappa N$, where
$0 < \kappa < 1$. From (5), we directly have

\[ I(0)\, e^{t} = \kappa N - \kappa I(0)\left(1 - e^{t}\right) \]

\[ \Rightarrow\; t = \ln\frac{\kappa}{1 - \kappa} + \ln\frac{N - I(0)}{I(0)}. \]

For fixed $\kappa$ this time is $\Theta(\ln N)$. ■
The above result says that the worm spreads exponentially 
fast in any relevant time-frame. We consider an example to 



illustrate what this means. 
Example 

Consider a worm with a virulence of $\beta = 1.5$ hosts per
minute and a susceptible population of 85,000 hosts. It would
take $\ln N = 11.3$ ITU or about 8 minutes to infect a significant
population. The performance of such a worm is comparable to
that of a worm like Slammer [4] that spread to 75,000 hosts
in 10 minutes. Other worms have much lower rates of spread
due to poor design of the scanning mechanism.
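
The arithmetic in this example follows from the expression for $t$ in the proof of Theorem 1. A minimal Python sketch, with hypothetical function names and an assumed single initially infected host, is:

```python
import math


def time_to_fraction_itu(kappa, n_hosts, i0):
    """Time (in ITU) for the infection to reach a fraction kappa of N hosts."""
    return math.log(kappa / (1.0 - kappa)) + math.log((n_hosts - i0) / i0)


def itu_to_time(t_itu, beta):
    """Convert ITU to actual time; the result is in the time unit of beta."""
    return t_itu / beta


if __name__ == "__main__":
    n_hosts, beta_per_minute = 85_000, 1.5  # values from the example
    t_half = time_to_fraction_itu(0.5, n_hosts, 1)  # assumes I(0) = 1
    print("ITU to infect half the hosts:", t_half)
    print("minutes:", itu_to_time(t_half, beta_per_minute))
```

For $\kappa = 1/2$ the first logarithm vanishes, so the result is essentially $\ln N$, which is the figure used in the example.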

The result also characterizes the time available for countermeasures
once the worm has appeared. Countermeasures are
useful only if they can do something about the problem in
$\Theta(\ln N)$ time; otherwise they are futile. We will keep
this in mind while studying patching schemes.

III. Patch Dissemination 

The propagation of worms can be halted by fixing the holes
in the application that allow them to spread. This is the point of
patching. As mentioned in the introduction, in most instances
so far a patch has been developed sufficiently quickly that the 
number of infected hosts at the time that the patch is released 
is small, so active defense by patching is possible [5]. Hosts 
must be informed about the availability of the patch, which we 
assume takes a short time since it is a simple update message. 
We are then faced with the second task of ensuring that all 
hosts obtain the patch. Once patched, a host that was infected 
cannot be reinfected by the worm. So the patching process 
reduces both the susceptible and the infected population, and 
eventually the system is worm free. We then have the following 
metrics to characterize any particular method of patching: 

• When does the infection hit its peak, and what is the
number of infected hosts at this time?

• How long does it take to end the infection?

We must answer the above questions keeping in mind the
fact that the worm might possibly cause significant damage
at $\Theta(\ln N)$ time. Our emphasis will be on the order relations
in the system. We will study two possible methods of patch 
dissemination: 

1) A system with a fixed number of patch servers. 

2) A peer-to-peer network. 

The system with a fixed number of patch servers models 
either a dedicated bank of patch servers or that of a content 
distribution network (CDN) with a fixed number of replicas, 
while the P2P system models either a patching worm or a 
CDN that is implemented in a P2P fashion. 

Fixed Number of Patch Servers 

Suppose the creator of the patch has a fixed number of patch 
servers. Both infected and susceptible hosts try to download 
patches from the patch servers. So the question arises whether 
a fixed number of servers can contain the spread of the worm. 
Let the number of servers be $P$, which is much smaller than
the total number of hosts present in the network. Let each
server be capable of disseminating $\gamma$ patches in an ITU. In
other words, the actual maximum rate at which each server
can disburse patches is $\gamma\beta$ patches per unit time. Then the rate
at which patches get disseminated by the servers is $\gamma P$ patches
per ITU, until the number of hosts to be patched is less than
$P$. After this point the rate is equal to the number of hosts
remaining times $\gamma$. This finishing phase is irrelevant to our
study, since the number of hosts patched during this time
is just $P$. We now construct the fluid differential equations
corresponding to the system.

Let the number of patched hosts at time $t$ be denoted by $P(t)$.
As before, the number of infected and susceptible hosts at
this time are $I(t)$ and $S(t)$ respectively. Also, the rate at which
the worm grows is $S(t)\, I(t)/N$. However, patching causes the
number of infectious hosts in the network to decrease. Servers
disburse patches to both infected as well as susceptible hosts.
Then the expected number of infected hosts that obtain the
patch in a unit time is $(\gamma P\, I(t))/(S(t) + I(t))$. In the fluid
model, this quantity is the rate at which the infection decreases.
Similarly, the rate at which the susceptible population decreases
is $(\gamma P\, S(t))/(S(t) + I(t))$. However, since the total number
of hosts in the system is fixed, we can describe the system in
terms of the infected and patched hosts alone as follows:

\[ \frac{dP(t)}{dt} = \gamma P \tag{7} \]

\[ \frac{dI(t)}{dt} = \frac{S(t)\, I(t)}{N} - \frac{\gamma P\, I(t)}{S(t) + I(t)} \tag{8} \]

\[ N = S(t) + I(t) + P(t) \tag{9} \]

The differential equations are valid when the number of patched
hosts is no greater than $N - P$, which is practically till all the
hosts are patched since $N \gg P$. We then have the following
theorem:

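Theorem 2: For the fixed number of servers paradigm, we
have that the number of infected hosts

\[ I(t) = \frac{(N - P - \gamma P t)\, \exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right)}{\exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right) + C}, \tag{10} \]

where $C = (N - P)/I(0) - 1 \in \Theta(N)$ and $t \in \left[0, \frac{N - 2P}{\gamma P}\right]$.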

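Proof: From (7) by simple integration, with the initial
condition $P(0) = P$, we have

\[ P(t) = \gamma P t + P. \tag{11} \]

So the time at which the number of patched hosts is $N - P$ is
$t = \frac{N - 2P}{\gamma P}$. Now, consider the infection process. From
(8) and (9) we have

\[ \frac{dI(t)}{dt} = \left(\frac{N - P(t) - I(t)}{N}\right) I(t) - \frac{\gamma P\, I(t)}{N - P(t)}
 = -\frac{I^{2}(t)}{N} + \left(1 - \frac{P(t)}{N} - \frac{\gamma P}{N - P(t)}\right) I(t). \]

Rearranging the above, we have the following Bernoulli
differential equation (with a quadratic nonlinearity)

\[ \frac{dI(t)}{dt} - \left(1 - \frac{P(t)}{N} - \frac{\gamma P}{N - P(t)}\right) I(t) = -\frac{I^{2}(t)}{N}. \]

Substituting $V(t) = \frac{1}{I(t)}$ yields a linear first order differential
equation of the form

\[ \frac{dV(t)}{dt} + \left(1 - \frac{P(t)}{N} - \frac{\gamma P}{N - P(t)}\right) V(t) = \frac{1}{N}. \tag{12} \]

The solution to (12) is of the form

\[ V(t) = \frac{\frac{1}{N}\int J(t)\, dt + C}{J(t)}, \tag{13} \]

where $C$ is a constant and

\[ J(t) = \exp\left(\int \left(1 - \frac{P(t)}{N} - \frac{\gamma P}{N - P(t)}\right) dt\right)
 = (N - P - \gamma P t)\, \exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right). \tag{14} \]

Here we have used the expression for $P(t)$ from (11). We now
need to evaluate $\frac{1}{N}\int J(t)\, dt$. This is accomplished by simple
integration using the expression for $J(t)$ from (14) as follows:

\[ \frac{1}{N}\int J(t)\, dt = \int \left(1 - \frac{P}{N} - \frac{\gamma P t}{N}\right) \exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right) dt. \]

Making the substitution $q = t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}$, and integrating,
we obtain

\[ \frac{1}{N}\int J(t)\, dt = \int e^{q}\, dq = \exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right). \tag{15} \]

Thus, (13), (14) and (15) yield the final answer

\[ V(t) = \frac{\exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right) + C}{(N - P - \gamma P t)\, \exp\left(t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}\right)}. \tag{16} \]

Note that $C = (N - P)/I(0) - 1 \in \Theta(N)$, as seen by plugging
in $t = 0$. Noting that $I(t) = 1/V(t)$, we have the proof. ■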
We see how similar the expression for $I(t)$ looks to (5).
Essentially, the infection progresses unhindered for small t. 
We expect that the effect of patching will not be felt till a fairly 
large number of hosts is infected. We are now ready to answer 
questions regarding its performance. We would first like to 
know when the number of infected hosts hits its maximum 
value. 
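
Before turning to the order results, it can help to evaluate the solution numerically. The Python sketch below implements the closed form obtained by solving (7)-(9) as in the proof above and locates the peak by a simple grid search. The function names, the grid search, and the sample parameters (100,000 hosts, 10 servers, $\gamma = 300$, 10 initially infected hosts) are illustrative assumptions, not values from the paper.

```python
import math


def fixed_server_infected(t, n_hosts, servers, gamma, i0):
    """Infected hosts at time t (ITU) with a fixed number of patch servers."""
    c = (n_hosts - servers) / i0 - 1.0  # constant fixed by I(0) = i0
    x = t - servers * t / n_hosts - gamma * servers * t * t / (2.0 * n_hosts)
    unpatched = n_hosts - servers - gamma * servers * t  # N - P(t)
    return unpatched * math.exp(x) / (math.exp(x) + c)


def peak_infection(n_hosts, servers, gamma, i0, steps=10_000):
    """Grid search for the peak over the validity interval [0, (N - 2P)/(gamma P)]."""
    t_end = (n_hosts - 2 * servers) / (gamma * servers)
    best_t, best_i = 0.0, float(i0)
    for k in range(steps + 1):
        t = t_end * k / steps
        i_t = fixed_server_infected(t, n_hosts, servers, gamma, i0)
        if i_t > best_i:
            best_t, best_i = t, i_t
    return best_t, best_i


if __name__ == "__main__":
    t_peak, i_peak = peak_infection(100_000, 10, 300.0, 10)
    print("peak time (ITU):", t_peak, "peak infected:", i_peak)
```

Even with servers that are hundreds of times faster than the worm, the peak in this illustrative setting is a constant fraction of all hosts.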

Corollary 3: For the fixed number of servers paradigm, the
number of infected hosts is unimodal and starts decreasing
when $t = 2\ln\left(\frac{N}{\sqrt{\gamma P\, I(0)}}\right) \in \Theta(\ln N)$.

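Proof: To find out when the number of infected hosts
starts decreasing, we need to find the time when $\frac{dI}{dt} < 0$. In
order to do this we differentiate (10) and obtain

\[ \frac{dI}{dt} = -e^{X}\, \frac{\gamma P\left(C + e^{X}\right) - \frac{C}{N}\left(N - P - \gamma P t\right)^{2}}{\left(C + e^{X}\right)^{2}}, \tag{17} \]

where

\[ X \triangleq t - \frac{P t}{N} - \frac{\gamma P t^{2}}{2N}. \]

Setting $\frac{dI}{dt} < 0$ and rearranging, we get

\[ M(t) \triangleq \frac{C\left(N - P - \gamma P t\right)^{2}}{N\, \gamma P \left(C + e^{X}\right)} < 1, \tag{18} \]

where $C$ is approximately $(N - P)/I(0)$ for large $N$.
We observe that for $t > 2\ln\left(\frac{N}{\sqrt{\gamma P\, I(0)}}\right)$ we have that
$M(t) < 1$ for large $N$. Thus, for $t \in \Theta(\ln N)$, the number of
infected hosts starts decreasing. ■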
Recall that in the system without patching, the time taken
for infection of a significant population is $\Theta(\ln N)$. So, the
effect of patching is felt at exactly this time frame. It also
shows that increasing the patching rate $\gamma P$ has little effect unless
it is impractically large (comparable to $N$). So even if the patch
servers work very fast as compared to the virus, there would
be no major consequence on the time at which the infection
decreases. We next consider the question of how many hosts 
are infected at this time. Because the graph is unimodal, this 
is also the time at which the maximum number of hosts is 
infected. 

Corollary 4: For the fixed number of servers paradigm, the
number of infected hosts is $\Theta(N)$ for $t \in \Theta(\ln N)$. This is
also the maximum number of infected hosts over all time.

Proof: Consider (10). For $t \in \Theta(\ln N)$, the number of
infected hosts is $\Theta(N)$. ■

The above result implies that a fixed number of patch servers
is simply unable to cope with the spread of a well designed
worm! In an unpatched system, the worm spreads to $\Theta(N)$
hosts in $\Theta(\ln N)$ time. Thus, as far as the worm is concerned,
a system with a fixed number of patch servers behaves as if
practically no patching were occurring up to $\Theta(\ln N)$ time.
A worm which timed its attack at $\Theta(\ln N)$ time would be
unstoppable. The next question is that of when the infection
actually dies down, i.e., how long will it take for the number
of infected hosts to come down to $\Theta(1)$?

Corollary 5: For the fixed number of servers paradigm, the
time taken for the number of infected hosts to decrease to
$\Theta(1)$ is $t = \frac{N - 2P}{\gamma P} \in \Theta(N)$.

Proof: From (10), substituting $t = \frac{N - 2P}{\gamma P}$, we have that
$\lim_{N \to \infty} I(t) = P \in \Theta(1)$. Hence the proof. ■
Thus, the infection is contained well after the attack takes 
place. We conclude that patching with a fixed number of 
servers is a futile activity. Clearly, we don't just need a patch 
that kills the worm on contact, but also an efficient distribution 
mechanism that can deal with the worm by creating new 
servers - a P2P system. 

Peer-to-Peer Patch Dissemination 

We have just seen that the fixed number of patch servers 
scheme performs extremely badly in disseminating patches. 
We would like to design a system that matches the worm in 
its capability to proliferate. The obvious solution is to use 



a P2P model. A patch received from a peer would have to be
checked with respect to a hash (sent with the update message,
for instance) to ensure security of patches. Such a method of 
verification has already been implemented in BitTorrent [26]. 
In the proposed scheme, hosts use a pull mechanism to obtain 
the patch, i.e., they contact hosts at random and ask them if 
they have the patch. If the patch is available, it is downloaded, 
verified, and installed. This mechanism is at variance with the 
push structure of the worm, in which infected hosts contact 
hosts at random and try to infect them. However, there is no 
real difference in the fluid model. 
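
The verification step of the pull mechanism can be sketched as follows. This is a minimal illustration in Python: the choice of SHA-256, the function names, and the idea that the update message carries a hex digest are all assumptions made here, since the text does not fix a hash function or message format.

```python
import hashlib
import hmac


def patch_digest(patch_bytes: bytes) -> str:
    """Hex digest of a patch (SHA-256 is an assumed choice, not from the paper)."""
    return hashlib.sha256(patch_bytes).hexdigest()


def verify_patch(patch_bytes: bytes, expected_digest_hex: str) -> bool:
    """Accept a patch from a peer only if it matches the digest from the update message."""
    actual = patch_digest(patch_bytes)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_digest_hex.lower())


if __name__ == "__main__":
    genuine = b"example patch contents"  # placeholder payload
    announced = patch_digest(genuine)    # digest the provider would announce
    print(verify_patch(genuine, announced))
    print(verify_patch(b"tampered contents", announced))
```

A host would install the patch only when `verify_patch` returns true, and otherwise retry with another randomly chosen peer.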

Construction of the fluid model is similar to what we have 
seen before. Let the number of hosts that initially possess the 
patch be P, which is much smaller than the total number of 
hosts present in the network. Let each host be capable of disseminating
a maximum of $\gamma$ patches in an ITU. Note that $\gamma$ is
likely to be smaller than the $\gamma$ that we encountered in the fixed
number of servers case, since the hosts in a P2P system are not
dedicated patch servers. The rate of patch dissemination looks
very similar to the rate of worm dissemination that we saw
in (4) and is given by $\frac{\gamma}{N}\bigl(S(t) + I(t)\bigr) P(t)$. Also, while the
rate at which the worm increases is still $\frac{1}{N} S(t)\, I(t)$, it now
decreases at the rate at which infected hosts are patched, which
is just $\frac{\gamma}{N} I(t)\, P(t)$. Then we have the following description
of the system:

\[ \frac{dP(t)}{dt} = \frac{\gamma}{N}\bigl(S(t) + I(t)\bigr) P(t) \tag{19} \]

\[ \frac{dI(t)}{dt} = \frac{1}{N} S(t)\, I(t) - \frac{\gamma}{N} I(t)\, P(t) \tag{20} \]

\[ N = S(t) + I(t) + P(t) \tag{21} \]


Our problem is now to solve the above system of equations and 
answer questions regarding the performance of the scheme. 
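
As a sanity check on the P2P model, the system of equations can also be integrated numerically. The following Python sketch uses a classical fourth-order Runge-Kutta step; the integrator, the step size, the function names, and the sample parameters (100,000 hosts, 10 initial patch holders, 10 initially infected hosts, $\gamma = 2$) are illustrative assumptions rather than the paper's simulation setup.

```python
def p2p_rates(patched, infected, n_hosts, gamma):
    """Right-hand sides of the P2P fluid model for P(t) and I(t) (time in ITU)."""
    susceptible = n_hosts - patched - infected
    d_patched = gamma / n_hosts * (susceptible + infected) * patched
    d_infected = (susceptible * infected - gamma * infected * patched) / n_hosts
    return d_patched, d_infected


def simulate_p2p(n_hosts, p0, i0, gamma, t_end, dt=0.01):
    """Integrate with RK4; return (peak time, peak infected, final infected)."""
    patched, infected = float(p0), float(i0)
    peak_time, peak_infected = 0.0, infected
    for k in range(int(round(t_end / dt))):
        k1p, k1i = p2p_rates(patched, infected, n_hosts, gamma)
        k2p, k2i = p2p_rates(patched + 0.5 * dt * k1p, infected + 0.5 * dt * k1i,
                             n_hosts, gamma)
        k3p, k3i = p2p_rates(patched + 0.5 * dt * k2p, infected + 0.5 * dt * k2i,
                             n_hosts, gamma)
        k4p, k4i = p2p_rates(patched + dt * k3p, infected + dt * k3i,
                             n_hosts, gamma)
        patched += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        infected += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
        if infected > peak_infected:
            peak_time, peak_infected = (k + 1) * dt, infected
    return peak_time, peak_infected, infected


if __name__ == "__main__":
    print(simulate_p2p(100_000, 10, 10, 2.0, 20.0))
```

In this illustrative setting the infection peaks at a few hundred hosts, far below $N$, and then decays to a negligible level within the simulated horizon.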

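Theorem 6: For the P2P paradigm, the number of infected
hosts at time $t$ is given by

\[ I(t) = \frac{e^{-\gamma t}}{g(t)\left(\frac{1}{N - P} + C\, g(t)^{1/\gamma}\right)}, \qquad g(t) \triangleq \frac{P}{N} + \left(1 - \frac{P}{N}\right) e^{-\gamma t}, \tag{22} \]

where $C = 1/I(0) - 1/(N - P)$, which is approximately $1/I(0) \in \Theta(1)$ for large $N$.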

Proof: The proof technique is similar to the one used 
earlier. We first solve for P(t) using dl9t and (12 1 1 which is 
known to have the solution (of the same form as Q) 



P(t) = 



Pel 1 



l-#(l-e7*) 



(23) 



We then use (20) and (21) to obtain

$$\frac{dI(t)}{dt} = \left(\frac{N - P(t) - I(t)}{N} - \frac{\gamma}{N}P(t)\right)I(t) = -\frac{I(t)^2}{N} + \frac{1}{N}\bigl(N - (1+\gamma)P(t)\bigr)I(t)$$

Rearranging, we have the following Bernoulli differential equation
with a quadratic nonlinearity

$$\frac{dI(t)}{dt} - \left(1 - \frac{1+\gamma}{N}P(t)\right)I(t) = -\frac{I(t)^2}{N}$$

We convert the above into a linear first order differential equation
by substituting $V(t) = \frac{1}{I(t)}$ and obtain

$$\frac{dV(t)}{dt} + \left(1 - \frac{1+\gamma}{N}P(t)\right)V(t) = \frac{1}{N} \tag{24}$$



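As before, the above equation has a closed form solution
given by

$$V(t) = \frac{\frac{1}{N}\int J(t)\,dt + C}{J(t)} \tag{25}$$

where $C$ is a constant and

$$J(t) = \exp\left(\int\left(1 - \frac{1+\gamma}{N}P(t)\right)dt\right) = \exp\left(t - \frac{1+\gamma}{\gamma}\ln\left(1 - \frac{P}{N}\left(1 - e^{\gamma t}\right)\right)\right) = \frac{e^{-\gamma t}}{\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)e^{-\gamma t}\right)^{1+\frac{1}{\gamma}}} \tag{26}$$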



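Here we have used the expression for $P(t)$ from (23). Now,
in order to obtain the closed form solution, we also require
$\frac{1}{N}\int J(t)\,dt$. So we proceed to integrate the above
expression. We have

$$\frac{1}{N}\int J(t)\,dt = \frac{1}{N}\int \frac{e^{-\gamma t}}{\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)e^{-\gamma t}\right)^{1+\frac{1}{\gamma}}}\,dt$$

We make the substitution $q = e^{-\gamma t}$ and obtain the relation

$$\frac{1}{N}\int J(t)\,dt = -\frac{1}{\gamma N}\int \frac{dq}{\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)q\right)^{1+\frac{1}{\gamma}}} = \frac{1}{N\left(1-\frac{P}{N}\right)}\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)e^{-\gamma t}\right)^{-\frac{1}{\gamma}} \tag{27}$$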



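Then using (25), (26) and (27), and simplifying we obtain

$$V(t) = \frac{1}{N^2}\,\frac{P e^{\gamma t}}{1-\frac{P}{N}} + \frac{1}{N} + C\,e^{\gamma t}\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)e^{-\gamma t}\right)^{1+\frac{1}{\gamma}} \tag{28}$$

Note that $C = 1/I(0) \in \Theta(1)$ for large $N$, as seen by
plugging in $t = 0$ (exactly, $C = 1/I(0) - 1/(N-P)$, and the second
term vanishes for large $N$). Finally using the fact that
$I(t) = 1/V(t)$ (by definition) we have the proof. ■

The result shows that as expected, the patch spreads exponentially,
directly competing with and destroying the worm.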
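
As a sanity check on the closed form, the sketch below evaluates $I(t)$ from (22) and $P(t)$ from (23), then compares a finite-difference derivative of $I(t)$ against the right-hand side of (20). It is only an illustration: the function names are hypothetical, the constant is taken as the exact value $C = 1/I(0) - 1/(N-P)$ obtained by setting $t = 0$ (which tends to $1/I(0)$ for large $N$), and the parameters are those of the $\gamma = 2$ experiment in Section IV.

```python
import math


def patched_closed_form(t, N, P, gamma):
    """P(t) from equation (23); t is in ITUs."""
    growth = math.exp(gamma * t)
    return P * growth / (1.0 - P / N * (1.0 - growth))


def infected_closed_form(t, N, P, I0, gamma):
    """I(t) from equation (22); t is in ITUs."""
    # Exact constant from V(0) = 1/I(0); approaches 1/I(0) for large N.
    C = 1.0 / I0 - 1.0 / (N - P)
    growth = math.exp(gamma * t)
    D = P / N + (1.0 - P / N) * math.exp(-gamma * t)
    V = (P * growth / (N * N * (1.0 - P / N))
         + 1.0 / N
         + C * growth * D ** (1.0 + 1.0 / gamma))
    return 1.0 / V


def ode_residual(t, N, P, I0, gamma, h=1e-5):
    """Difference between a central-difference dI/dt and the RHS of (20)."""
    I = infected_closed_form(t, N, P, I0, gamma)
    Pt = patched_closed_form(t, N, P, gamma)
    rhs = (N - Pt - I) * I / N - gamma * I * Pt / N
    lhs = (infected_closed_form(t + h, N, P, I0, gamma)
           - infected_closed_form(t - h, N, P, I0, gamma)) / (2.0 * h)
    return lhs - rhs


for t in (1.0, 3.0, 5.0, 8.0):
    print(t, infected_closed_form(t, 360_000, 10, 25, 2.0),
          ode_residual(t, 360_000, 10, 25, 2.0))
```

The residuals should be negligible compared with the size of $dI/dt$, which is what one expects if (22) solves the system.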



We can perform a similar analysis as we did in the fixed 
number of servers case to determine when the infection starts 
decreasing. We have the following result: 

Corollary 7: For the P2P paradigm, the number of infected
hosts is unimodal and decreases for
$t > \frac{1}{\gamma}\ln\left(\frac{N}{\gamma P}\right) \in \Theta(\ln N)$.

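Proof: As before, the proof is obtained by differentiation.
Note that $V(t) = \frac{1}{I(t)}$, hence

$$\frac{dV(t)}{dt} = -\frac{1}{I(t)^2}\,\frac{dI(t)}{dt},$$

which means that we need to find the time at which $V(t)$
starts increasing. Differentiating (28) and setting
$\frac{dV(t)}{dt} > 0$, we obtain

$$\frac{\gamma P e^{\gamma t}}{N^2\left(1-\frac{P}{N}\right)} + \frac{1}{I(0)}\left(\frac{P}{N} + \left(1-\frac{P}{N}\right)e^{-\gamma t}\right)^{\frac{1}{\gamma}}\left(\gamma\,\frac{P}{N}\,e^{\gamma t} - \left(1-\frac{P}{N}\right)\right) > 0 \tag{29}$$

Since the first term is positive, and $\frac{P}{N}$ is small compared to
1, a sufficient condition for large $N$ is

$$\gamma\,\frac{P}{N}\,e^{\gamma t} > 1 \implies t > \frac{1}{\gamma}\ln\left(\frac{N}{\gamma P}\right)$$

Note that the first term in (29) is small for
$t < \frac{1}{\gamma}\ln N$. So the condition on $t$ is actually tight
for large $N$. Thus, for
$t > \frac{1}{\gamma}\ln\left(\frac{N}{\gamma P}\right)$, the number of
infected hosts is decreasing. Hence the proof. ■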

The result says that even in the P2P case, it would take
$\Theta(\ln N)$ time for the infection to start decreasing. It also
says that the time at which the infection starts decreasing is 
unaffected by the initial number of infected hosts, unlike the 
fixed server case. 

The number of infected hosts at this time (which is also the 
maximum) ought to be much lower than in the fixed servers 
case since far more hosts have been patched in this time. We 
show that this is indeed true in the following result: 

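Corollary 8: For the P2P paradigm, the maximum number
of infected hosts is

$$I_{\max} \in \begin{cases} \Theta\left(N^{\frac{1}{\gamma}}\right) & \text{for } \gamma > 1 \\ \Theta(N) & \text{for } \gamma \le 1. \end{cases}$$

Proof: The proof follows directly by substituting
$t = \frac{1}{\gamma}\ln\left(\frac{N}{\gamma P}\right)$ in (22) and
letting $N \to \infty$. The maximum number of infected hosts for
$\gamma > 1$ is

$$\frac{\gamma\,I(0)\,N^{\frac{1}{\gamma}}}{P^{\frac{1}{\gamma}}\,(1+\gamma)^{1+\frac{1}{\gamma}}} \in \Theta\left(N^{\frac{1}{\gamma}}\right)$$

For $\gamma \le 1$, we get from (22) that $I_{\max} \in \Theta(N)$.
Hence the proof. ■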
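
The substitution used in this proof can be written out explicitly. The steps below are a sketch that takes $C = 1/I(0)$ and drops terms of relative order $P/N$; the symbol $t^*$ for the Corollary 7 time is introduced here only for convenience.

```latex
\begin{align*}
e^{\gamma t^*} &= \frac{N}{\gamma P},
\qquad
\frac{P}{N} + \Bigl(1-\frac{P}{N}\Bigr)e^{-\gamma t^*} \approx \frac{(1+\gamma)P}{N}, \\
V(t^*) &\approx \underbrace{\frac{1}{\gamma N} + \frac{1}{N}}_{\text{first two terms of (28)}}
 + \frac{1}{I(0)}\,\frac{N}{\gamma P}\Bigl(\frac{(1+\gamma)P}{N}\Bigr)^{1+\frac{1}{\gamma}} \\
&= \frac{1+\gamma}{\gamma N}
 + \frac{(1+\gamma)^{1+\frac{1}{\gamma}}}{\gamma\,I(0)}\Bigl(\frac{P}{N}\Bigr)^{\frac{1}{\gamma}}.
\end{align*}
```

For $\gamma > 1$ the second term, of order $N^{-1/\gamma}$, dominates, and inverting it gives the expression in the proof. For $\gamma \le 1$ the first term is of at least the same order, so $I_{\max} = 1/V(t^*)$ is of order $N$; for instance, $\gamma = 1$ gives $I_{\max} \approx N/(2 + 4P/I(0))$.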
The above results show that even a P2P system has limited
effect in $\Theta(\ln N)$ time if the patching constant $\gamma \le 1$.
This seems intuitively correct: since the virulence $\beta$ of the worm
has been normalized to 1, only if $\gamma > 1$ will we observe
significant reduction in the maximum number of infected
hosts. The final question is that of when the infection is
stamped out, i.e., how long does it take for the number of
infected hosts to become small?

Corollary 9: For the P2P paradigm, the time taken for
the number of infected hosts to decrease to $\Theta(1)$ is
$t = \frac{1}{\gamma}\left(1 + \frac{1}{\gamma}\right)\ln N \in \Theta(\ln N)$.

Proof: The proof follows directly from substituting
$t = \frac{1}{\gamma}\left(1 + \frac{1}{\gamma}\right)\ln N$ in (22) and
letting $N \to \infty$. ■

Thus, the time at which the infection starts decreasing
and the time at which it is wiped out are both $\Theta(\ln N)$. Soon
after the infection hits its peak, it also disappears. If a worm
were to time its attack at $\Theta(\ln N)$ time, it would only have a
marginal impact on the network.

Discussion 

It is interesting to compare the different results we have
with regard to the effect of patching constant $\gamma$ on the time
at which the infection starts to decrease and the maximum
number of infected hosts.

In Corollary 3, $\gamma$ appears only within the logarithm. So only
a $\gamma$ that is comparable with $N$ has any real effect. On the other
hand, in Corollary 7, $\gamma$ appears both inside and outside the
logarithm. Inside the logarithm, it would have to be quite large
to have any visible effect. However, since it appears outside
and operates on $\ln N$ as well, the effect of even $\gamma = 2$ is
significant.

Again, in Corollary 4, we noticed that for any $\gamma \in \Theta(1)$,
the maximum number of infected hosts was $\Theta(N)$. Increasing
$\gamma$ has no effect unless $\gamma$ is of $\Theta(N)$, which is
physically impossible. On the other hand, in Corollary 8, even
increasing $\gamma$ by a small amount results in order differences in
the maximum number of infected hosts.

So even a small rate of patching by the peers of a P2P 
network has far more impact than an enormous rate of a fixed 
number of servers. The results illustrate the profound impact 
that throttling the worm can have on the system - for a fixed 
number of patch servers throttling is of limited value, but 
in a P2P system throttling gains are magnified enormously. 
Thus, if we use the patch provider's P servers as seed servers 
for distributed patch delivery in a P2P system, we can truly 
achieve outstanding performance - we wipe out the infection 
exponentially fast! 

IV. Experiments 

We use data measured on the Internet along with simulations 
to illustrate the fact that our analytical results, which assumed 
large N, can be used to make fairly good predictions on 
reasonably large systems, and so mirror reality. We consider a 
Code-Red v2 type worm with a virulence $\beta = 1.8$ infections
per hour [14] and a susceptible population of 360,000 hosts
(seen from Figure 1). The spread of this worm was measured
in [1] and we obtained the data used in their study courtesy 
of CAIDA (www.caida.org). Our simulations were performed 
by using Simulink to simulate the fluid differential equations. 




Fig. 2. Graph illustrating the performance of a fixed number of patch servers. 
The solid line is measured Internet data, while the dashed line corresponds 
to the fluid model. 



We perform our first experiment on the system with a fixed 
number of patch servers. This was probably the method used 
in handling Code-Red v2, as the rate of patching with time 
was seen to be linear [1], [27], with about 15,000 hosts 
being patched in 8 hours. The data obtained from Internet 
measurement is plotted as a solid line in Figure 2, while the
dashed line is the simulation. We calculated from the data
that the patching rate $\gamma P$ was roughly 7,800 per ITU. We
assume that $P = 25$ (this number is not important since only
$\gamma P$ has an effect on the system) and $I(0) = 25$. The zero for
time was chosen by matching the exponential growth phase of
the measured data with that of the simulation. Measurement
stopped when the worm went into attack mode and so stopped
random scanning. From Corollary 3 we expect the time at
which the infection hits its peak is
$t = 2\ln\left(\frac{N}{\sqrt{\gamma P\,I(0)}}\right)$ ITU,
i.e., about 7.5 hours, which matches fairly well with the graph.
We also expect from Corollary 4 that the maximum number
of infected hosts would be of order $10^5$, while the graph
shows this value as about $2.3 \times 10^5$. Finally, we expect from
Corollary 5 that the infection is wiped out in $\frac{N}{\gamma P}$ ITU,
which is about 25 hours. There is no Internet data on this 
number (since measurement stopped during the attack phase), 
but the simulation result matches well with this value. 
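
The arithmetic behind these two estimates is short enough to script. The sketch below assumes the expressions quoted in this paragraph for Corollary 3, $t = 2\ln\bigl(N/\sqrt{\gamma P\,I(0)}\bigr)$, and Corollary 5, $t = N/(\gamma P)$, and converts ITUs to hours using 1 ITU $= 1/\beta$ hours. The function names are illustrative, not part of the original study.

```python
import math


def fixed_server_peak_time(N, gamma_P, I0):
    """Corollary 3 estimate (in ITUs) of when the infection peaks."""
    return 2.0 * math.log(N / math.sqrt(gamma_P * I0))


def fixed_server_wipeout_time(N, gamma_P):
    """Corollary 5 estimate (in ITUs) of when the infection is wiped out."""
    return N / gamma_P


def itu_to_hours(t_itu, beta):
    """One ITU lasts 1/beta hours when beta is in infections per hour."""
    return t_itu / beta


N = 360_000       # susceptible population
beta = 1.8        # infections per hour
gamma_P = 7_800   # aggregate patching rate per ITU
I0 = 25           # initially infected hosts

peak_h = itu_to_hours(fixed_server_peak_time(N, gamma_P, I0), beta)
wipe_h = itu_to_hours(fixed_server_wipeout_time(N, gamma_P), beta)
print(f"peak after about {peak_h:.1f} h, wiped out after about {wipe_h:.1f} h")
```

Only the product $\gamma P$ enters both formulas, which is why the individual value of $P$ does not matter here.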

We note that a fluid model to explain the behavior of 
Code-Red v2 was studied in [13] and a figure reminiscent of
Figure 2 was presented there. However, the model in [13] uses
an epidemic type patch dissemination, which is not directly
related to the number of patch servers or the ratio of the
patching rate to the worm propagation rate $\gamma$. Our model is
explicitly in terms of these physically measurable quantities. 
Further, the results in [13] are numerical solutions, while we 
obtain closed-form solutions which allow us to analytically 
predict the performance of different schemes. 

We next perform experiments with the P2P system. Here 
we have no Internet data, since such a system has not been 
implemented. However, we use simulations to illustrate that 
our order results are valid. First we take $\gamma = 1$, $P = 10$
and $I(0) = 25$. The results are shown in Figure 3. We
make use of Corollaries 7, 8 and 9 to find the expected
numerical values. The expected time at which the infection
starts reducing is 5.8 hours, which matches well with the
graph. The number of infected hosts ought to be of the order
$10^5$ at this time, and the graph shows a value of $1.1 \times 10^5$.
Finally, the infection ought to end in about 14 hours, which
matches quite well with the simulation (the tail is difficult to
see in the figure as the peak is quite high). Notice that even
with $\gamma = 1$ the P2P system takes about half the time to wipe
out the infection as the fixed server scheme.




Fig. 3. Graph illustrating the performance of a P2P system with $\gamma = 1$.

Our next experiment on the P2P system is to take $\gamma = 2$,
$P = 10$ and $I(0) = 25$. We wish to illustrate the effect
of increasing $\gamma$ to 2. The results appear in Figure 4. The
time at which we expect the infection to start decaying is
2.7 hours, which is approximately what we see in the graph.
The number of hosts infected at this time should be of order
$10^3$, which compares with $1.8 \times 10^3$ that we see in the graph.
Notice that both the time at which decay begins and the
maximum number of infected hosts have shrunk sharply. The
effect becomes more and more pronounced as $\gamma$ is increased.
Finally, we expect that the infection is over in 5.3 hours, which
is what we see in the graph.

The simulations back up our analytical results, indicating the
strength of P2P patching: a much lower number of infections
and a much lower time in which the infection is contained.
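
The expected values quoted for the two P2P experiments can be reproduced from the corollaries. The sketch below is illustrative only: it uses $t = \frac{1}{\gamma}\ln\frac{N}{\gamma P}$ for Corollary 7, $t = \frac{1}{\gamma}\bigl(1+\frac{1}{\gamma}\bigr)\ln N$ for Corollary 9, and evaluates (22) at the Corollary 7 time with $C = 1/I(0)$ and terms of relative order $P/N$ neglected. The function names are hypothetical.

```python
import math


def p2p_peak_time(N, P, gamma):
    """Corollary 7: time (ITUs) after which the infection decreases."""
    return math.log(N / (gamma * P)) / gamma


def p2p_peak_size(N, P, I0, gamma):
    """Approximate I(t) from (22) at the Corollary 7 time.

    Assumes C = 1/I(0) and neglects terms of relative order P/N.
    """
    v = ((1.0 + gamma) / (gamma * N)
         + (1.0 + gamma) ** (1.0 + 1.0 / gamma)
         * (P / N) ** (1.0 / gamma) / (gamma * I0))
    return 1.0 / v


def p2p_wipeout_time(N, gamma):
    """Corollary 9: time (ITUs) for the infection to fall to order 1."""
    return (1.0 + 1.0 / gamma) * math.log(N) / gamma


N, P, I0, beta = 360_000, 10, 25, 1.8   # beta in infections per hour
for gamma in (1.0, 2.0):
    peak_h = p2p_peak_time(N, P, gamma) / beta
    wipe_h = p2p_wipeout_time(N, gamma) / beta
    size = p2p_peak_size(N, P, I0, gamma)
    print(f"gamma={gamma}: peak at {peak_h:.1f} h with about {size:.0f} "
          f"infected, wiped out by {wipe_h:.1f} h")
```

Doubling $\gamma$ roughly halves the peak time but cuts the peak size by about two orders of magnitude, because $\gamma$ enters the peak size through the exponent $1/\gamma$ of $N$.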

Fig. 4. Graph illustrating the performance of a P2P system with $\gamma = 2$.

V. How Many Hosts Have to Be Monitored?

So far we have studied the rate of propagation of worms
and tried to understand the performance of possible patching
schemes. We now consider a related problem which a
network operator would be interested in - the number of
hosts that require monitoring in order to obtain knowledge
of the existence of the worm. Monitoring in this fashion
yields data on the distribution of infected hosts and efficacy
of patching, for instance the data presented in Figures 1 and
2. Thus, monitoring provides information on questions like
when the infection began, where it originates, how many
hosts are infected at any time and so on. Monitoring is often
carried out by passively observing an unused portion of the
Internet address space - a so-called "Network Telescope"
[7]. Since the telescope should normally not receive any
packets, scans directed at it often correspond to worm attacks.
Another possibility is for the network operator to passively
monitor a number of real hosts in their address space so as
to obtain information about anomalous behavior. It has also
been suggested that a P2P network could be used to identify
anomalous behavior of hosts [8], [9]. In all cases, we are
interested in answering the following questions:

• How many hosts have to be monitored in order to find 
one instance of worm presence in the network in a short 
time? 

• How many hosts have to be monitored in order to find 
out how many infected hosts are present at a given time? 

To answer the first question, we need to understand the
behavior of the worm just after it comes into existence. We
have the following result.

Theorem 10: A worm spreads exponentially fast initially,
regardless of the patching scheme.

Proof: The proof follows from the solution of the classical
epidemic model and Theorems 2 and 6 by taking $t \ll N$. ■

We know that the time taken for effects of patching to show
up is $\Theta(\ln N)$ (Corollaries 3 and 7). We would like to find out
about the worm before it spreads to too many hosts. Since the
worm spreads exponentially initially, if we would like to know
about the worm when the number of infected hosts is $\Theta(\ln N)$,
we have to ensure that the monitors pick up its presence in
$\Theta(\ln \ln N)$ time. The question is that of how many hosts to
monitor to obtain this information.

Suppose that we monitor $M$ hosts. Then the probability
that a particular infected host chooses one of these monitored
hosts is $\frac{M}{N}$. The expected number of monitored hosts that are
scanned in an ITU by all infected hosts is given by $\frac{M\,I(t)}{N}$.
So in the fluid model (with the same assumptions as before),
this value is the rate at which monitored systems are scanned.
Then we have

$$\frac{dM(t)}{dt} = \frac{M\,I(t)}{N}, \tag{30}$$

where $M(t)$ is the total number of scans received by monitored
hosts in the time interval $[0, t]$. We now have the following
result.

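Theorem 11: In order to detect the worm by
$t \in \Theta(\ln \ln N)$, the number of hosts that have to be monitored
is $\Theta\left(\frac{N}{\ln N}\right)$.

Proof: The proof is obtained by straightforward integration
of (30). We have

$$M(t) = \frac{M}{N}\int_0^t I(s)\,ds = \frac{M}{N}\int_0^t \frac{I(0)e^{s}}{1 - \frac{I(0)}{N}\left(1 - e^{s}\right)}\,ds = M\ln\left(1 - \frac{I(0)}{N}\left(1 - e^{t}\right)\right),$$

where we have used the solution of the classical epidemic model
for $I(s)$ in the second step and $M(0) = 0$. We
could equivalently use $I(t) = I(0)e^{t}$ as Theorem 10 suggests.

Since we would like $M(t)$ to be of order 1 in order to detect
the worm at some time $t$, we have from the above that

$$M = \frac{1}{\ln\left(1 - \frac{I(0)}{N}\left(1 - e^{t}\right)\right)}$$

Then if we are to detect the worm in $\ln \ln N$ time, we need

$$M = \frac{1}{\ln\left(1 - \frac{I(0)}{N}\left(1 - e^{\ln \ln N}\right)\right)}$$

Since $e^{\ln \ln N} = \ln N$ and $\ln(1 + x) \approx x$ for small $x$,
this is approximately $\frac{N}{I(0)(\ln N - 1)}$, which is easily
verified to be $\Theta\left(\frac{N}{\ln N}\right)$. ■
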
The result indicates that as the number of hosts in the system 
increases, the number of hosts to be monitored in order to 
obtain fast information about the worm is extremely large. 
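
To see how close the exact expression from the proof is to the $N/\ln N$ rule, one can evaluate both numerically. The sketch below is an illustration under the assumption $I(0) = 1$; the function names are hypothetical. It uses `math.log1p` and `math.expm1` because the argument of the logarithm differs from 1 by a very small amount when $N$ is large.

```python
import math


def monitors_needed(N, I0, t):
    """M such that the expected scan count M(t) equals 1 at time t (ITUs).

    Implements M = 1 / ln(1 - (I0/N) * (1 - e^t)) in a numerically
    stable way: ln(1 + x) via log1p, e^t - 1 via expm1.
    """
    return 1.0 / math.log1p(I0 / N * math.expm1(t))


def monitors_needed_loglog(N, I0=1):
    """Monitors needed to detect the worm by t = ln ln N."""
    return monitors_needed(N, I0, math.log(math.log(N)))


for N in (85_000, 10**6, 2**32):
    exact = monitors_needed_loglog(N)
    rule = N / math.log(N)
    print(f"N={N}: exact {exact:.3e}, N/ln N {rule:.3e}, ratio {exact / rule:.3f}")
```

The ratio of the two stays within a small constant factor and drifts toward 1 as $N$ grows, consistent with the $\Theta(N/\ln N)$ claim. A larger $I(0)$ lowers the requirement in proportion.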

We now consider the second question. Suppose that we want
to know how many hosts are infected at a particular time.
We know from the previous section that the amount of time
required for patching to take effect in the fixed number of patch
server case is $\Theta(\ln N)$. If we assume that most of the infected
hosts take this long to be patched, then an infected host is
active for approximately this amount of time. Any one infected
host would perform $t$ scans in $t$ time (remember that $\beta$ has
been normalized to 1). So the number of scans received by the
monitor by a single infected host in $\Theta(\ln N)$ time (assuming
that the infected host is not patched in the interval) is
$\frac{M}{N}\ln N$. If we set this number equal to 1, i.e., our monitor
receives 1 scan from a particular infected host, we need

$$\frac{M}{N}\ln N = 1 \implies M = \frac{N}{\ln N}, \tag{31}$$

which is identical to the previous result. Thus, the thumb
rule of monitoring $\frac{N}{\ln N}$ hosts would give a good estimate
of the events in the network. However, if a P2P method of
patching were used, since patching occurs exponentially fast,
one would have to monitor events at a finer time scale. If we
assume that the number of infected hosts that were patched
in time $\Theta(\ln \ln N)$ is small (it could be a maximum of
$\Theta(\ln N)$ which is small), then the state of the system remains
relatively constant in this tiny time interval. Proceeding as
before we get that the number of hosts to be monitored is
now $\frac{N}{\ln \ln N}$, which is even higher. The implications of the
above results are best illustrated by examples.

Example

Consider the Slammer-like worm example, where the total
number of susceptible hosts is 85,000 and the virulence
$\beta = 1.5$ infections per minute. We saw earlier that a
significant population was compromised in 8 minutes. Also,
$\ln \ln N = 2.42$ ITU, which is 1.6 minutes. Thus, if we want
to know about the worm's presence in 1.6 minutes, we would
have to monitor $85{,}000 / \ln(85{,}000) \approx 7{,}500$ hosts, which is
close to $\frac{1}{11}$th of the population. The number of infected hosts
at this time would be of the order $\ln N = 11$ hosts.

Example

Consider the current Internet, which largely runs IPv4.
There are a total of $2^{32}$ addresses present in the system. If
we would like to know about the presence of a worm in
$\ln \ln N = 3$ ITU, we would have to monitor
$2^{32} / \ln 2^{32} \approx 2^{27}$ addresses - which is larger than a
/8 prefix! To understand the events in a P2P system, we need to
monitor practically all the hosts.
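
The numbers in both examples follow from the rules of thumb $N/\ln N$ (fixed servers) and $N/\ln\ln N$ (P2P). The sketch below recomputes them; the helper name `monitoring_example` and the dictionary keys are illustrative choices, and the time conversion assumes 1 ITU $= 1/\beta$ in whatever time unit $\beta$ is given.

```python
import math


def monitoring_example(N, beta=None):
    """Rule-of-thumb monitoring figures for a population of N addresses.

    beta is the virulence in infections per unit time; when given, the
    detection time is also converted from ITUs into that time unit.
    """
    ln_n = math.log(N)
    detect_itu = math.log(ln_n)              # ln ln N, in ITUs
    result = {
        "detect_itu": detect_itu,
        "monitors": N / ln_n,                # fixed-server rule N / ln N
        "p2p_monitors": N / detect_itu,      # P2P rule N / ln ln N
        "infected_at_detection": ln_n,       # order ln N infected hosts
    }
    if beta is not None:
        result["detect_time"] = detect_itu / beta
    return result


slammer = monitoring_example(85_000, beta=1.5)   # beta per minute
ipv4 = monitoring_example(2**32)

print(f"Slammer-like: {slammer['monitors']:.0f} monitors, "
      f"detection in {slammer['detect_time']:.1f} min")
print(f"IPv4: about 2^{math.log2(ipv4['monitors']):.1f} monitored addresses, "
      f"P2P case about {ipv4['p2p_monitors'] / 2**32:.0%} of the address space")
```

For comparison, a /8 prefix holds $2^{24}$ addresses, so the IPv4 figure is several times larger than a single /8 telescope.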

Our conclusion is that one cannot hope to achieve mon- 
itoring of such a large fraction of the Internet without the 
active participation of either the network operators or the hosts 
themselves. The numbers strongly support the establishment
of a P2P monitoring system, either with the network operators
sharing data on anomalous behavior or with peers looking
out for abnormal activity, such as repeated SYNs from an
arbitrary host, and reporting it to a central monitor that would
keep records. A system with monitoring based on the lines
suggested in [8], [9] might be the best way to accomplish this 
goal. 

VI. Conclusions 

In this paper, we have sought to make a convincing case 
for the use of P2P networks for tackling Internet worms. We 
first studied the classical epidemic fluid model in order to 
understand the time scales of events. Using analysis, measured 
data and simulations, we then showed that a fixed number of 
patch servers is incapable of handling an epidemic. We also 
showed that a P2P system is far better suited to handle worm 
outbreaks, both in terms of the maximum number of infected
hosts, as well as the time taken to wipe out the infection. 
Finally, we considered the issue of monitoring the network and 
showed that the number of monitors required is so large that 
one would need cooperation either among network operators 
or the hosts in the system to obtain reliable estimates. 

We would like to extend our work to include a complete 
stochastic analysis of worms so as to place the fluid models 
on a sound mathematical foundation. We would also like to 
understand the effects of more complex worm models in order 
to include second order effects. 